Abstract

A new matching method is proposed for the estimation of the average treatment 
effect of social policy interventions (e.g., training programs or health care measures). 
Given an outcome variable, a treatment and a set of pre-treatment covariates, the 
method is based on the examination of random recursive partitions of the space of 
covariates using regression trees. A regression tree is grown on either the treated or
the untreated individuals only, using as response variable a random permutation
of the indexes 1, ..., n (n being the number of units involved), while the indexes
for the other group are predicted using this tree. The procedure is replicated in
order to rule out the effect of specific permutations. The average treatment effect 
is estimated in each tree by matching treated and untreated in the same terminal 
nodes. The final estimator of the average treatment effect is obtained by averaging
over all the trees grown. The method does not require any specific model assumption
apart from the tree's complexity, which, however, does not affect the estimator. We
show that this method is both an instrument to check whether two samples can
be matched (by any method) and, when this is feasible, a way to obtain reliable
estimates of the average treatment effect. We further propose a graphical tool to inspect the
quality of the match. The method has been applied to the National Supported 
Work Demonstration data, previously analyzed by Lalonde (1986) and others.

1 Introduction



A wide category of estimators has been developed in the last decade to evaluate the ef- 
fects of medical, epidemiological and social policy interventions (for instance, a training 
program) over the individuals (for a survey on methods and applications see Rosenbaum, 
1995 and Rubin, 2003). Matching estimators represent a relevant class in this category: 
these estimators aim to combine (match) individuals who have been subject to the intervention
(henceforth, treated) and individuals with similar pre-treatment characteristics who
have not been exposed to it (untreated or controls), in order to estimate the effect of the 
external intervention as the difference in the value of an outcome variable. 

From a technical viewpoint, a major obstacle to obtaining a satisfactory match is
the "curse of dimensionality", i.e. the dimension of the covariate set.
This set should be as large as possible, in order to include all the relevant information
about the individuals (Heckman et al. 1997, 1998), but increasing the number of covariates
makes the match a complicated task.

Two main approaches have been developed to solve the matching problem through a
one-dimensional measurement: the propensity score matching method (PSM) and the
matching method based on distances (DM). The PSM method makes use of the notion of
propensity score, which is defined as the probability of being exposed to the treatment,
conditional on the covariates. Treated and controls are matched on the basis of their scores
according to different criteria: stratification, nearest neighbor, radius, kernel (see Smith 
and Todd, 2004a, for a review). The DM method makes use of specific distances (e.g. 
Mahalanobis) to match treated and untreated (see e.g. Rubin, 1980, Abadie and Imbens, 
2004). 

In this paper we propose a matching method based on the exploration of random 
partitions of the space of covariates. Treated and controls are considered similar, i.e. 
matched, when belonging to the same subset in a partition. As will become clear in the
following, our technique is unaffected by the dimensionality problem.

More precisely, we make use of a non-standard application of regression trees. The
CART (classification and regression trees) methodology has been introduced in the
literature on matching estimators as one of the alternatives to parametric models in the
assignment of propensity scores (see e.g. Rubin, 2003 and Ho et al., 2004) or to directly
match individuals inside the tree (Stone et al., 1995). Our use of regression trees
is different: our technique exploits the ability of trees to partition the
space of covariates.

The methodology proposed here is to grow a regression tree only on one group, for 
example the treated, in the following way: we assign each unit a progressive number 
(label) and use it as a response variable. We grow the tree until it has only one (or at most
a few) units in the terminal nodes (leaves). Then we use the tree to predict the labels for the
units of the other group (in this example, the controls). Units (treated and controls) are
matched if they are in the same leaf and only if the balancing property for their covariates
is met. As the resulting tree (i.e. the partition of the covariate space) depends strongly
on the initial assignment of the labels (see Section 4), we perform several permutations of
these labels and then take the average of all the results to get the final treatment effect
estimate.

To shed light on the properties of the method, we use the National Supported Work
Demonstration (NSW) data coming from a training program originally analyzed by Lalonde
(1986) and subsequently by a large number of studies aiming to test the performance of different
evaluation methodologies (Dehejia and Wahba, 1999, 2002; Becker and Ichino, 2002;
Smith and Todd, 2004a, 2004b; Dehejia, 2004; Abadie and Imbens, 2004).

This dataset is peculiar in that its previous analyses pointed out that "the question is
not which estimator is the best estimator always and everywhere" (Smith and Todd,
2004a); the task of the investigation should be to provide a tool able to signal when the
matching methods can be successful or when, on the contrary, alternative methodologies ought
to be applied. The procedure proposed in this paper is a possible candidate to perform
this task; moreover, when matching can be applied, the estimators obtained with this
procedure are normally distributed, robust with respect to the complexity of the
tree and capable of reducing the bias.

The paper is organized as follows: in Section 2 we introduce the notation and define
the object of estimation. Section 3 is devoted to a brief description of the CART
methodology, which is functional to Section 4, where the idea behind the proposed technique is
illustrated. In Sections 5 and 6 we present in detail the algorithmic flow of the procedure
and our empirical results to show the above mentioned properties of the method. Section
7 introduces a graphical tool which is useful to assess the quality of the match. Most of
the tables and all the figures can be found at the end of the paper.

2 The matching problem 

Matching estimators have been widely used in the estimation of the average treatment
effect (ATE) of a binary treatment on a continuous scalar outcome. For individual
$i = 1, \ldots, N$, let $(Y_i^T, Y_i^C)$ denote the two potential outcomes, $Y_i^C$ being the outcome of
individual $i$ when he is not exposed to the treatment and $Y_i^T$ the outcome of individual $i$
when he is exposed to the treatment. For instance, the treatment may be
participation in a job training program and the outcome may be the wage. If both $Y_i^C$
and $Y_i^T$ were observable, then the effect of the treatment on $i$ would be simply $Y_i^T - Y_i^C$.
The root of the problem is that only one of the two outcomes is observed, whilst the
counterfactual is to be estimated.

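Generally, the object of interest in applications is the average treatment effect on the
subpopulation of the $N_T$ treated subjects (ATT). Let $\tau$ be the ATT, then $\tau$ can be written
as

$$\tau = \frac{1}{N_T} \sum_{i \in T} \left( Y_i^T - Y_i^C \right),$$

where $T$ denotes the set of indices of the treated.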

As said, the first problem in practice is to estimate the unobserved outcome $Y_i^C$ for
individual $i$ who was exposed to the treatment. If the assignment to the treatment were
random, then one would use the average outcome of some similar individuals who were
not exposed to the treatment. This is the basic idea behind matching estimators. For each
$i$, matching estimators impute the missing outcome as the average outcome of untreated
individuals similar, up to the covariates, to the treated one.

To ensure that the matching estimators identify and consistently estimate the treatment
effects of interest, we assume that: a) (unconfoundedness) assignment to treatment
is independent of the outcomes, conditional on the covariates; b) (overlap) the probability
of assignment is bounded away from zero and one (see Rosenbaum and Rubin, 1983;
Abadie and Imbens, 2004, and references therein).

As mentioned in the Introduction, the problem with matching is that the number of
covariates and their nature usually make it hard to provide an exact match between treated
and control units.

3 About classification and regression trees 

Classification and Regression Trees (CART) were proposed by Breiman et al. (1984)
as a classifier or predictor of a response variable (either qualitative or quantitative),
conditional on a set of observed covariates, which collects sample units in groups that are as
homogeneous as possible with respect to the response variable. The main assumption in
CART is that the space of covariates X admits a partition and the tree is just a
representation of this partition. For our purposes this means that the space X is divided into cells or
strata where we can match treated and control units.

Ingredients for growing a tree are the space of covariates X, the response variable 
Y and a homogeneity criterion (e.g. the deviance or the Gini index). One of the most 
commonly used methods to grow a tree is the "one-step lookahead" tree construction with 
binary splits (see Clark and Pregibon, 1992). Given the set X we say that its partition 
can be represented as a tree or, more precisely, by its terminal nodes called leaves. Data 
are then subdivided in groups according to the partition of X. 

In the first step of the procedure the covariate space X is not partitioned and all the
data are included in the root node. The root is then split with respect to one covariate
$X_j$ into two subsets such that $X_j > x$ and $X_j < x$ (in case $X_j$ is a continuous variable,
but similar methods are conceived for discrete, qualitative and ordered variables). The
variable $X_j$ is chosen among all the $k$ covariates $X_1, X_2, \ldots, X_k$ in such a way that the
reduction in deviance inside each node is the maximum achievable. This procedure is
iterated for each newly created node. The tree construction is stopped when a minimum
number of observations per leaf is reached or when the additional reduction in deviance
is considered too small.
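
As a minimal sketch of the split search described above (an illustration only, not the implementation of Breiman et al. or of any CART library), the following pure-Python functions scan every covariate and return the binary split with the largest reduction in deviance. The names `deviance` and `best_split`, and the choice of midpoints between consecutive observed values as candidate thresholds, are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def deviance(y: List[float]) -> float:
    """Sum of squared deviations from the node mean (regression deviance)."""
    if not y:
        return 0.0
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)


def best_split(X: List[List[float]],
               y: List[float]) -> Optional[Tuple[int, float, float]]:
    """One-step lookahead search for a binary split.

    Returns (covariate index, threshold, deviance reduction) for the split
    x_j <= threshold versus x_j > threshold that reduces the deviance the
    most, or None when no covariate takes two distinct values.
    """
    parent = deviance(y)
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [yi for row, yi in zip(X, y) if row[j] <= thr]
            right = [yi for row, yi in zip(X, y) if row[j] > thr]
            gain = parent - deviance(left) - deviance(right)
            # Strict inequality: ties are resolved in favour of the first split found.
            if best is None or gain > best[2]:
                best = (j, thr, gain)
    return best
```

Applying `best_split` recursively to the two resulting subsets, until a node is too small or the gain is negligible, gives the tree-growing loop outlined in the text.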

Any new observation can be classified (or its value predicted) by dropping it down 
through the tree. Note that even observations with missing values for some covariates can 
be classified this way. 

The use of CART to replace parametric models (probit or logit) to estimate the propensity
score is not unknown to the literature (see e.g. Ho et al., 2004). CART is also used
to directly match treated and control units inside the leaves (see e.g. Stone et al., 1995,
among others). These approaches are used to solve the problem of model specification,
which is always difficult to justify in practice (see Dehejia, 2004, for an account of the
sensitivity of the estimate to model specification).

The CART technique is also seen as a variable selection method and it is rather useful
compared to parametric modelling because interactions between variables and polynomial
transforms are handled automatically.

4 Random partition of the space of covariates using CART

In this section we present a new approach to the use of CART. As previously noted,
the CART methodology generates partitions of the space X maximizing homogeneity inside
the leaves with respect to the response variable. Our proposal is to build a tree only on
the treated and to grow it up to a sufficiently high level of complexity, such that each
terminal leaf contains at most a few treated. This produces a partition of X that reflects
the stratification structure of the treated. To this end, $n_T$ being the sample size of the
treated, we assign to the treated a response variable which is a sequence of numbers from
1 to $n_T$.
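
The construction can be sketched in a few lines of Python. This is a simplified illustration and not the authors' R implementation: the helper names (`grow`, `leaf_of`, `random_partition`), the midpoint thresholds and the rule "split any node holding at least `min_split` units" are assumptions made for the example. The tree is grown on the treated with a random permutation of 1, ..., n_T as response, and any other unit can then be dropped down the tree to find its leaf.

```python
import random


def _sse(y):
    """Deviance of a non-empty node: squared deviations from the node mean."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y)


class Node:
    """A tree node: leaves carry a leaf id, internal nodes a split rule."""
    def __init__(self):
        self.var = self.thr = self.left = self.right = self.leaf_id = None


def grow(X, y, min_split, counter):
    """Recursively split the rows of X so as to minimise the deviance of y.
    A split is attempted only on nodes holding at least `min_split` rows."""
    node, best = Node(), None
    if len(y) >= min_split:
        for j in range(len(X[0])):
            vals = sorted({r[j] for r in X})
            for lo, hi in zip(vals, vals[1:]):
                thr = (lo + hi) / 2.0
                yl = [v for r, v in zip(X, y) if r[j] <= thr]
                yr = [v for r, v in zip(X, y) if r[j] > thr]
                cost = _sse(yl) + _sse(yr)
                if best is None or cost < best[0]:
                    best = (cost, j, thr)
    if best is None:  # node too small, or all rows identical (ties)
        node.leaf_id = counter[0]
        counter[0] += 1
        return node
    _, node.var, node.thr = best
    lhs = [(r, v) for r, v in zip(X, y) if r[node.var] <= node.thr]
    rhs = [(r, v) for r, v in zip(X, y) if r[node.var] > node.thr]
    node.left = grow([r for r, _ in lhs], [v for _, v in lhs], min_split, counter)
    node.right = grow([r for r, _ in rhs], [v for _, v in rhs], min_split, counter)
    return node


def leaf_of(node, row):
    """Drop one observation down the tree and return the id of its leaf."""
    while node.leaf_id is None:
        node = node.left if row[node.var] <= node.thr else node.right
    return node.leaf_id


def random_partition(X_treated, X_controls, min_split=2, seed=None):
    """Grow a tree on the treated using a random permutation of 1..n_T as the
    response, then return the leaf ids of the treated and of the controls."""
    rng = random.Random(seed)
    labels = list(range(1, len(X_treated) + 1))
    rng.shuffle(labels)
    tree = grow(X_treated, labels, min_split, [0])
    return ([leaf_of(tree, r) for r in X_treated],
            [leaf_of(tree, r) for r in X_controls])
```

With `min_split=2` and no ties among the covariates, every treated unit ends up alone in its leaf, and each call with a different `seed` explores a different partition of the covariate space.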

This tree is used to assign controls to leaves. Treated and controls belonging to the
same leaf are then directly matched. Formally the ATT is estimated as follows:

$$\hat\tau = \frac{1}{n_T} \sum_{i \in T} \left( Y_i^T - \frac{1}{|C_i|} \sum_{j \in C} w_{ij} Y_j^C \right) \qquad (1)$$

where $w_{ij} = 1$ if treated $i$ and control $j$ are matched in a leaf, otherwise $w_{ij} = 0$. $T$ is
the set of the indices for the treated, $C_i = \{j : w_{ij} = 1\}$, $i \in T$, and $|C_i|$ is the number of
elements in set $C_i$. The variance of $\hat\tau$ can be directly calculated:



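$$\mathrm{Var}(\hat\tau) = \left( \frac{1}{n_T} \right)^2 \sum_{i \in T} \left( \mathrm{Var}(Y_i^T) + \sum_{j \in C} \left( \frac{w_{ij}}{|C_i|} \right)^2 \mathrm{Var}(Y_j^C) \right) \qquad (2)$$

$C$ being the set of indices for the controls. Further, set $w_{ij}/|C_i| = 0$ if $w_{ij} = 0$ by
definition. Provided we are given a consistent estimator $\hat\mu_0(X)$ of $\mu_0(X) = E(Y^C | X)$, we
can also adjust the bias for the difference-in-covariates (see Abadie and Imbens, 2004),
obtaining the following adjusted version of equation (1):

$$\hat\tau' = \frac{1}{n_T} \sum_{i \in T} \left( \left( Y_i^T - \hat\mu_0(X_i) \right) - \frac{1}{|C_i|} \sum_{j \in C} w_{ij} \left( Y_j^C - \hat\mu_0(X_j) \right) \right) \qquad (3)$$

and the variance of $\hat\tau'$ is the following: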
4.1 Generating random partitions

The tree structure is strongly dependent on the assignment of the values $(1, \ldots, n_T)$ of the
response variable to the treated, and so is the ATT estimate. In order to marginalize this
dependence, we replicate the tree construction, randomly permuting the set $(1, \ldots, n_T)$
of the values of the response variable assigned to each treated unit [1]. The final estimator
of the ATT will be the average of the ATTs obtained in each replication. The number
of possible permutations is $n_T!$, which is also the maximum number of the significant
partitions of the space X.

At each replication, the balancing property on the covariates has to be tested inside 
each leaf. When this property is not met, the treated and control units involved are 
excluded from the matching. 
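
Once every unit has been assigned to a leaf, the matching estimate for a single tree can be sketched as follows. The function `att_from_leaves` is a hypothetical helper written for illustration; it assumes the leaf ids have already been computed, it skips the balancing test described above, and it averages over the treated units that found at least one control in their leaf. All three are simplifications, not the procedure used in the paper.

```python
def att_from_leaves(y_treated, leaf_treated, y_controls, leaf_controls):
    """Leaf-based matching estimate of the ATT for one tree.

    Each treated outcome is compared with the mean outcome of the controls
    falling in the same leaf. Treated units whose leaf holds no control stay
    unmatched and are skipped. Returns (estimate, number of matched treated);
    the estimate is None when nobody could be matched.
    """
    controls_by_leaf = {}
    for y, leaf in zip(y_controls, leaf_controls):
        controls_by_leaf.setdefault(leaf, []).append(y)

    diffs = []
    for y, leaf in zip(y_treated, leaf_treated):
        matched = controls_by_leaf.get(leaf)  # outcomes of the controls in C_i
        if matched:
            diffs.append(y - sum(matched) / len(matched))

    if not diffs:
        return None, 0
    return sum(diffs) / len(diffs), len(diffs)
```

The second returned value, the number of matched treated, is the quantity that signals how much of the treated sample a given partition was able to use.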

Moreover, to increase the number of matched treated in each replication, we generate
a subsequence of trees to match the residual treated: first a tree is grown and matching
is carried out. If not all the treated have been matched, we keep track of these and we
grow another tree using a different permutation of $(1, \ldots, n_T)$. At this second step we
now match the remaining treated only. The procedure is iterated until either all the
treated units have been matched or a prescribed maximum number of iterations has been
reached. This can be done with or without replacement for the controls. So we have $R$
replications and for each replication we estimate the ATT as in (1) (or (3)). Denote by
$\hat\tau^{(k)}$ the ATT estimator at the $k$-th replication ($k = 1, \ldots, R$), computed with the
weights $w_{ij}^{(k)}$ obtained in that replication.

The final ATT estimator is defined as the average over all the replications:

$$\hat\tau = \frac{1}{R} \sum_{k=1}^{R} \hat\tau^{(k)}$$

At each replication, the set of weights can be represented as a matrix of matches.
Define the proximity matrix $P$ as follows:

$$p_{ij} = \sum_{k=1}^{R} w_{ij}^{(k)} \qquad (6)$$



This matrix, examined in detail in Section 7, contains information on the quality of the
match between treated and controls. In fact, each row $i$ of $P$ tells how many times, over
$R$ replications, treated unit $i$ has been matched with each control. Even if this is not a
distance matrix, it can be used as a starting point for calculating DM-estimators.
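
A proximity matrix of this kind can be accumulated with a simple counting loop. The sketch below is illustrative only: `proximity_matrix` is a hypothetical helper, and it treats two units as matched whenever they share a leaf, leaving aside the balancing test that the procedure applies before accepting a match.

```python
def proximity_matrix(replications, n_treated, n_controls):
    """Count, for each treated/control pair, the replications in which the
    two units fell in the same leaf.

    `replications` is a list of (leaf_treated, leaf_controls) pairs, one per
    replication, each holding the leaf id of every unit.
    """
    P = [[0] * n_controls for _ in range(n_treated)]
    for leaf_treated, leaf_controls in replications:
        for i, leaf_i in enumerate(leaf_treated):
            for j, leaf_j in enumerate(leaf_controls):
                if leaf_i == leaf_j:  # matched in this replication
                    P[i][j] += 1
    return P
```

Row `i` of the result can then be read as described above: large entries point to controls that are repeatedly matched with treated unit `i`.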



4.2 Further improvements 

As usual in applications, $n_T \ll n_C$. So it might happen that the tree grown on
the treated contains a high number of controls in each terminal leaf. This may cause the
balancing property to fail in a relevant number of cases even in spite of a long subsequence
of iterations. This implies that a small number of treated are included in the matching
and generates additional bias in the estimates.

[1] In other direct matching estimators, the same argument on order dependency applies, as the order
chosen to match the individuals affects the final estimator of the ATT (see e.g. Abadie and Imbens, 2004,
D'Agostino, 1998).

If this is the case, we propose to grow a tree only on the controls. One might expect 
to find a greater number of terminal leaves with a few controls and treated per leaf by 
construction. In such a tree, the balancing property should be met more frequently. 

On balanced samples both kinds of approach generate similar partitions. On the
contrary, for unbalanced samples, this alternative procedure may be effective. 

5 Empirical results 

In this section we present an application of the above procedure analyzing, once again,
the well known benchmark example of the NSW data from Lalonde (1986). In the spirit
of reproducible research, all the examples, including data sets, software and scripts, are
available at the web page http://www.economia.unimi.it/rtree. The software we
use is the open source statistical environment called R, which has recently increased its
popularity also among econometricians (see for example, among others, King's projects
Zelig and MatchIt, also including specific matching routines; Kosuke et al., 2004; Ho et
al., 2004).

5.1 The data 

The National Supported Work (NSW) data come from Lalonde's (1986) seminal study on
the comparison between experimental and non-experimental methods for the evaluation
of causal effects. The data contain the results of a training program: the outcome of
interest is the real earnings of workers in 1978, the treatment is the participation in the
program. Control variables are age, years of education, two dummies for ethnic groups
(black and hispanic), a dummy for the marital status, one dummy to register the possession
of a high school degree, earnings in 1975 and earnings in 1974. The set contains 297
individuals exposed to the treatment and 425 control units. This is an experimental
sample, named LL throughout the text. In their 1999 and 2002 papers, Dehejia and Wahba
select a subset of LL on the basis of 1974 earnings, restricting the sample to 185 treated
and 260 controls. We call this sample DW. Further, Smith and Todd (2004a) suggest a
more consistent selection on the basis of 1974 earnings of individuals from the LL sample,
limiting the number of treated to 108 and the number of controls to 142. We refer to this set as ST.
Along with these experimental samples, it is common practice to use a non-experimental
sample of 2490 controls coming from the Panel Study of Income Dynamics. These
data, called PSID in the following, are generally used to prove the ability of PSM or
DM methods in matching experimental and non-experimental data (see e.g. Dehejia and
Wahba, 1999, 2002; Abadie and Imbens, 2004) or to show, on the contrary, that LL (and
its sub-samples) are not comparable with PSID data (Smith and Todd, 2004a). In the
following we call "naive target" the difference in means of the outcome variable (earnings
in 1978) between treated and controls in the experimental samples (LL, DW and ST). This is
considered to be the benchmark value for the ATT.

5.2 The implementation of the random tree approach

In our approach, trees are grown to their maximum complexity using the recursive
partition method as suggested by Breiman et al. (1984). The only varying parameter is the
minimum split (just split in the tables), which is the minimum number of observations
that must be included in a leaf for a split to be attempted (this means that with a split
equal to 2, in absence of ties, the resulting tree has one observation per leaf). As in
Abadie and Imbens (2004), there is only one parameter the user needs to choose during
the analysis and yet the estimates do not show particular sensitivity with respect to it.
We made 250 replications, i.e. we searched for 250 different partitions of the space of the
covariates, allowing for a maximum number of 50 iterations per replication (which means a
maximum of 250*50 = 12500 trees grown; for the actual number of trees grown see Table
1). A small number of iterations per replication means that matching is easily attainable.



          Trees grown on the treated        Trees grown on the controls
split     LL    DW    ST    DWvsPSID        LL    DW    ST    DWvsPSID
  50       5     4     2       43            6     4     2       42
  32       6     5     4       49            7     5     5       39
  20      10     6     5       50            8     7     6       35
  16      15    12     9       50           10     8     7       36
   8      47    45    31       50           15    14    11       37
   4      50    50    50       50           17    17     8       41
   2      50    50    50       50           45    43    37       50

Table 1: Average number of iterations per replication. In every sample studied, the average
number of trees grown varies from 2*250=500 to 50*250=12500, 250 being the number of
replications. The higher the number of iterations, the harder the matching. For matchable samples
the number of iterations increases as the split decreases.

Contrary to some DM and PSM implementations in the literature, we still check for
the balancing property in each leaf of the trees using a t test for quantitative covariates
and a $\chi^2$ test for qualitative or dichotomous variables, using a significance level of 0.005. The
difference-in-covariates correction is applied using a standard regression tree to estimate
$\mu_0(X)$ (see Section 4).

5.3 Results on experimental data 

In Tables 2 and 3 we present the results for the experimental samples LL, DW and ST.
From the empirical analysis in these tables it emerges that the ATT estimators $\hat\tau$ and $\hat\tau'$ are both
close to the target and stable (i.e. not sensitive to the split parameter) in all the samples.
The estimated standard deviation of the population in equation (2) is denoted by $\hat\sigma_{\tau}$
(and $\hat\sigma_{\tau'}$ respectively) and is also close to the target (in this case the pooled standard
deviation). Tables 2 and 3 also report the following information: the mean number of
treated and controls matched per replication, the number of trees in 250 replications that
matched more than 95% of the treated units in the sample ("% o.t." in the tables, namely
"over the 95% threshold"), the 95% Monte Carlo confidence intervals for $\tau$ and $\tau'$ and the
results SW and SW' for the Shapiro-Wilk normality test (see e.g. Royston, 1995) on the
distribution of $\tau$ and $\tau'$; a significant value for the test (one or more bullets in the tables)
means non-normality.
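
Two of the summaries listed above are easy to compute from the per-replication output, and a sketch may help fix ideas. The code below is an assumption-laden illustration: the helper names are hypothetical, and the interval is taken as an empirical percentile interval of the replication estimates, which is one common way to build a Monte Carlo interval and is not necessarily the construction used for the tables.

```python
import math


def share_over_threshold(matched_counts, n_treated, threshold=0.95):
    """Percentage of replications that matched more than `threshold` of the
    treated units (the "% o.t." indicator)."""
    hits = sum(1 for m in matched_counts if m > threshold * n_treated)
    return 100.0 * hits / len(matched_counts)


def percentile_interval(estimates, level=0.95):
    """Empirical interval read off the sorted replication estimates.
    The lower rank is rounded down and the upper rank rounded up, so the
    interval errs on the wide side."""
    s = sorted(estimates)
    alpha = (1.0 - level) / 2.0
    lo = s[math.floor(alpha * (len(s) - 1))]
    hi = s[math.ceil((1.0 - alpha) * (len(s) - 1))]
    return lo, hi
```

Here `matched_counts` holds the number of treated units matched in each replication and `estimates` the corresponding ATT estimates.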

The difference between Tables 2 and 3 is in the way the trees are grown. Usually
"switched" trees (trees grown on the controls only) of Table 3 match almost all the treated
even at very small split. This is due to the fact that the control group is numerically bigger
than the treated one, so that the switched tree has a number of leaves higher than the
number of treated units and matching frequently occurs one-to-one. Conversely, on "straight"
trees (trees grown on the treated only) controls tend to group on leaves where only one
treated is present, often causing the violation of the balancing property. In Table 2 one
can see that when the percentage of trees that match at least 95% of treated is low, the
estimators do not exhibit a good performance. So this percentage can be taken as a
quality indicator of the match. In principle, the evaluation of $\tau$ requires matching all the
treated units: a poor match, i.e. the systematic exclusion of some treated individuals
from the matching process, forces the ATT estimator to neglect a part of the treatment
effect and hence introduces a bias in the estimates. Notice that from this point of view
our procedure is rather selective compared to DM methods or to some implementations of
the PSM methods, because we require the balancing property to be met inside each leaf.
Also note that the confidence intervals for $\tau'$ are usually a bit smaller than the analogous
ones for $\tau$, meaning that, although the population variance is affected by the
difference-in-covariates correction, this is not likely to happen for the standard error of the estimator.
Remarkably, the results highly agree in both tables.

5.4 Results on non-experimental data

Table 4 shows the results obtained by matching DW treated units and PSID controls.
The use of non-experimental controls to evaluate the average effect of treatment of NSW
treated workers has been attempted in many studies and the conclusion of Smith and
Todd (2004a) is that low bias estimates cannot be achieved by matching estimators when
using samples coming from different contexts (different labour markets, in this case) and
relying on non-homogeneous measures of the outcome variable. Therefore, they conclude
that NSW data and PSID cannot reach a good match. The same evidence arises in Abadie
and Imbens (2004), where the authors try to match the same two samples. We obtain
results qualitatively similar to these studies.

As in Smith and Todd (2004a) and Abadie and Imbens (2004), the results in Table 4
show a bad quality of the matching and the consequent unreliability of the ATT estimates.
Unlike the quoted literature, the method proposed here gives a clear view that
something is really wrong in matching these two samples (even in absence of a target).

First, notice that the trees that match more than 95% of the treated ("% o.t.")
never exceed 59% of the total and that the average number of treated units matched
in each tree is never higher than 177 (out of 185 treated in the DW sample). As previously
discussed, having a higher number of leaves, the "switched" trees achieve better values
of the "% o.t." indicator and match, on average, a higher number of treated per tree,
together with a lower average number of controls. The average number of iterations,
never lower than 35 and often equal to the maximum, confirms the difficulty of matching
the two samples (see Table 1). Moreover, the number of iterations appears to be higher
the lower the split parameter is, i.e. the lower the "% o.t." indicator is.

On the whole, all the indicators seem concordant in pointing out that DW treated
and PSID controls cannot achieve a good quality match, and this should suggest that
these two samples are not to be used to evaluate the average treatment effect through a
matching estimator. Examining the values of $\hat\tau$ and $\hat\tau'$ in Table 4, one can in fact observe
that the estimated ATT is quite far from the naive target (1794), particularly before the
difference-in-covariates correction.

More notable, however, is that many features of the estimates would confirm
this unreliability even in the absence of a benchmark. The main evidence is the remarkable
difference between the ATT estimate before and after the difference-in-covariates
adjustment: in our case the signs of $\hat\tau$ and $\hat\tau'$ are systematically different. The
difference-in-covariates adjustment brings the estimates closer to the target, especially when the
ATT is evaluated from the "switched" trees: this indicates that, due to the difficulty in
matching the two samples, the lack of balance in the covariates inside the leaves is still
considerable when $\tau$ is estimated. As one may expect (see Section 4.2), when the two samples
cannot be easily matched, the "switched" trees provide lower bias estimates. The estimated
variance of the ATT is particularly high, both with trees grown on the treated and
with trees grown on the controls. The variance of the estimator also assumes high values,
so that the confidence intervals we obtain include the target value of the ATT, at least
after the difference-in-covariates adjustment (and with a partial exception when split=2),
only due to the width of the intervals themselves. Lastly, the Shapiro-Wilk test shows
that the estimators seldom have a normal distribution.

6 Summary of the method 

In the previous section we showed the performance of the proposed tool in two typical
situations of average treatment effect estimation, on experimental (Section 5.3) and non-experimental
data (Section 5.4). The results seem to be able to answer the main questions: "are my data
really comparable in terms of the matching procedures?"; if so, "can I reliably estimate
the average treatment effect?" More importantly, both questions can be addressed even
in absence of a known target and without needing to introduce model assumptions hardly
justifiable in practical situations. We would like to summarize here the properties of the
method and give an outline on how to use this tool in applied analyses.

The characteristics of the method 

• the method, as is desirable for any method, does not require knowing the target in
advance to provide conclusions/evidence on the data;

• the method does not need any additional model assumption to be specified;

• the method is not sensitive to the only parameter (the split) the user can specify
(this is actually the only, so to say, "assumption" we put on the complexity of the
grown trees, i.e. on the model);

• explicit indications of low quality of the match are given by the method, in terms
of difficulty to match treated and controls (see Table 1) and number of trees that
match almost all (at least 95% of) the treated (see Tables 2, 3 and 4);

• the estimating procedure provides information on the reliability of the estimates
(i.e. when treated and controls are correctly matched the difference-in-covariates
correction should not considerably affect the ATT estimate);

• the procedure is available as open source free software and can be
easily implemented for other software environments like Stata, Matlab etc.;

• the method has at least one limitation: no formal theory is available yet. On the
other hand, one can think of our approach as a non-parametric stratified permutation
test approach to the problem of average treatment effect estimation. This seems to
be a promising way, still under investigation, of proving formal properties of the
ATT estimator;

• even if not stressed in the text, CART is a method known to be robust to missing
values in the dataset and this can be an important advantage in some non-experimental
situations.

How to use the procedure 

We conclude this section illustrating a sort of step-by-step guide, in algorithmic form, for 
using the method in applied research. 

Step 0. initialization: choose k, the number of replications, to be at least 100, and 
set the flag iter to FALSE; 

Step 1. run k replications using a maximum number of 50 iterations on the straight 
and switched trees. Let the split vary from, say, 50 to 2. 

Step 2. observe the following indexes for the different values of the split parameter: 
a) the number of partitions/replications that match at least 95% of the treated; b) 
the average number of iterations per replication; c) the values of f and f'; 

Step 3. Three cases may happen: 

Case i: a) is low, the match is not successfully realized. In this case you should 
notice high values in b) and a substantial effect on the difference-in-covariates 
adjustment in c). Match is not the tool to analyze this dataset. Stop. 

Case ii: a) is high, b) is low and c) present stable results then c) gives you 
reliable informations on the true value of the ATT. 

• If iter=FALSE set it to TRUE. Go to Step 1 using a higher number k of 
replications to obtain more precise ATT estimates (for example in terms of 
confidence intervals). You don't need to grow both straight and switched 
trees at this point: just build the trees on the subset with the higher 
number of observations. You can also drop values of the split (usually the 
low values) corresponding to lower values of a). 

• If iter=TRUE then you can rely on the ATT estimates. Stop.

Case iii: you cannot draw sharp conclusions from the values in a), b) and
c) for different values of the split.

• If iter=FALSE set it to TRUE and go to Step 1 using a higher number k of
replications, possibly selecting a subset of the split values.

• If iter=TRUE then you have no reliable ATT estimates: matching is not 
the tool to analyze this dataset. Stop. 
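
The decision rule of Step 3 can be sketched in code. The following is only an illustrative sketch: the text speaks of "high", "low" and "stable" values without fixing thresholds, so the names `SplitSummary`, `classify`, `next_action` and all four numeric thresholds are assumptions introduced here, not part of the procedure.

```python
from dataclasses import dataclass


@dataclass
class SplitSummary:
    """Indexes a), b), c) of Step 2 for one value of the split parameter."""
    split: int
    frac_matching: float    # a) share of replications matching >= 95% of the treated
    mean_iterations: float  # b) average number of iterations per replication
    att: float              # c) ATT estimate
    att_corrected: float    # c) difference-in-covariates corrected ATT estimate


def classify(summaries, high_a=0.9, low_a=0.5, low_b=5.0, max_rel_shift=0.25):
    """Return 'i', 'ii' or 'iii' in the spirit of Step 3.

    All four thresholds are hypothetical placeholders for the qualitative
    judgements 'high', 'low' and 'stable' used in the text.
    """
    best = max(summaries, key=lambda s: s.frac_matching)
    if best.frac_matching < low_a:
        return "i"  # a) is low for every split: the match is not realized
    good = [s for s in summaries if s.frac_matching >= high_a]
    stable = all(
        s.mean_iterations <= low_b
        and abs(s.att_corrected - s.att) <= max_rel_shift * abs(s.att)
        for s in good
    )
    if good and stable:
        return "ii"
    return "iii"


def next_action(case, iter_flag):
    """Map a case and the iter flag to the action prescribed by the guide."""
    if case == "i":
        return "stop: matching is not the tool for this dataset"
    if not iter_flag:
        return "set iter=TRUE and rerun Step 1 with a larger k"
    if case == "ii":
        return "stop: rely on the ATT estimates"
    return "stop: no reliable ATT estimates"
```

In practice the thresholds would be chosen by inspecting the indexes across split values, exactly as the guide suggests; the function only makes the branching explicit.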

Given the random nature of the method, one further pass of the protocol (iter=TRUE) with
an increased number k of replications is always advisable, in order to eliminate the
influence of the initial seed of the random generator used. As already mentioned in the
text, with k=250 replications and a maximum number of iterations per tree equal to 50, the
total number of trees generated (i.e. the total number of partitions explored) is
appreciably high: up to 250 x 50 = 12500.

7 The proximity matrix and a related simple ATT 
estimator 

The proximity matrix P defined earlier summarizes the number of matches realized along
the R replications. The element $p_{ij}$ of P reports the number of matches involving treated
i and control j. The graphical representation of P allows for an easy inspection of the
quality of the match. Figure 1 is a representation of matrix P for split values 2, 16
and 50 for the LL experimental sample. Treated units are represented on the y axis
while the control units are on the x axis. The intensity of the spots is proportional to
the corresponding $p_{ij}$: the darker the spot, the higher the number of times the match
between treated i and control j is realized. Given a split=2, paired samples are expected
to return an image with one spot per line. Whenever a treated unit matches more than
a single control, the corresponding line shows several spots. This occurs either when
imposing larger values of the split parameter (spurious matches) or when using unpaired
samples (a treated unit has more than one truly similar individual among the controls), or
for both reasons. Therefore, the two conditions for a good match are: a) having at least
one spot per line and b) having spots as dark as possible. Images with many faint spots
and few dark spots are the result of a low-quality match. When samples can
be successfully matched, reducing the split value should generate a cleaner and sharper
image, as spurious matches are removed when the split decreases while the darkness of the
spots representing true matches is not affected.
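
As a minimal sketch of how such a matrix could be accumulated, assume that each replication yields a terminal-node (leaf) label for every treated and every control unit. The input layout and the function names `proximity_matrix` and `unmatched_treated` are hypothetical and are not taken from the authors' software.

```python
def proximity_matrix(treated_leaves, control_leaves):
    """Count, over replications, how often treated i and control j share a leaf.

    treated_leaves[r][i] and control_leaves[r][j] are the terminal-node labels
    assigned in replication r (assumed input layout).
    """
    n_t = len(treated_leaves[0])
    n_c = len(control_leaves[0])
    P = [[0] * n_c for _ in range(n_t)]
    for t_rep, c_rep in zip(treated_leaves, control_leaves):
        for i, leaf_i in enumerate(t_rep):
            for j, leaf_j in enumerate(c_rep):
                if leaf_i == leaf_j:
                    P[i][j] += 1  # one more match between treated i and control j
    return P


def unmatched_treated(P):
    """Rows of P with no spot at all, i.e. treated units that were never matched."""
    return [i for i, row in enumerate(P) if not any(row)]
```

A row of zeros corresponds to a line of the image without any spot, which violates condition a) above; large entries correspond to dark spots.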

Figure 2 reports a comparison between the proximity matrix for the DW experimental
data (upper image) and the DWvsPSID non-experimental data (middle image) with
split=50. This value of the split allows for the highest number of matches for the
DWvsPSID sample (see Table 4), including spurious ones. As one can see, the image for
the DW data is evidence of a good-quality match. On the contrary, the image for the
DWvsPSID data has only a few faint spots, showing that the two groups (treated/controls)
in this non-experimental sample cannot be successfully matched. Notice that the upper and
middle images have different x-axis lengths. One might be tempted to attribute the
different intensity of the two images to this fact (see footnote 2). To avoid any
misunderstanding, the upper image has been rescaled to the same number of points as the
middle image. The result, reported in the bottom picture of Figure 2, shows that the
evidence of the matching quality does not vanish because of rescaling.

Footnote 2: In fact, the pixels in the upper image are wider than the ones in the middle image.
This graphical analysis of P is coherent with the analysis based on the indexes used
in previous sections to assess the quality of a match.

Once the proximity matrix has been introduced, it seems natural to introduce an ATT 
estimator based on P defined as follows 



with $C_i = \{j : p_{ij} > 0\}$. The evaluation of this P-based estimator, denoted here by
$\tilde{\tau}$ (and of its corrected version $\tilde{\tau}'$), in our datasets is reported in
Tables 5, 6 and 7. As one can see, the results differ (although slightly) from those reported
in Tables 2, 3 and 4. In fact, the two estimators $\hat{\tau}$ and $\tilde{\tau}$ coincide
only if



which is true, for instance, with paired samples and split parameter equal to 2. The
estimates of the variance are also affected by the same difference in the weights.
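
To illustrate how an estimator can be built on P, the sketch below assumes, hypothetically, that each control j in $C_i$ contributes to the counterfactual of treated i with a weight proportional to $p_{ij}$. This weighting is an assumption made for illustration and is not necessarily the authors' definition; the function name `att_from_proximity` is likewise invented here.

```python
def att_from_proximity(P, y_treated, y_control):
    """ATT estimate using p_ij as weights over C_i = {j : p_ij > 0}.

    Assumption (not from the source): the counterfactual outcome of treated i
    is the p_ij-weighted mean of the outcomes of the controls in C_i.
    Treated units with an empty C_i are skipped.
    """
    diffs = []
    for i, row in enumerate(P):
        total = sum(row)
        if total == 0:
            continue  # C_i is empty: treated i was never matched
        counterfactual = sum(p * y for p, y in zip(row, y_control)) / total
        diffs.append(y_treated[i] - counterfactual)
    if not diffs:
        raise ValueError("no treated unit has a matched control")
    return sum(diffs) / len(diffs)
```

Under this assumed weighting, controls that share a leaf with a treated unit in many replications dominate its counterfactual, while spurious matches that occur only occasionally contribute little.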

8 Concluding remarks

The method we propose, which attains a match by directly partitioning the covariate space
instead of using a score or a distance approach, seems able to discriminate whether two
samples can be matched or not; if the answer is positive, the method provides reliable
estimates of the average treatment effect. Although no formal theory has been developed
yet, these computational results are promising. Representing the match through
the proximity matrix is an additional useful tool to understand when a good match
cannot be obtained (and hence alternative methods should be used to evaluate the effect
of a treatment) and also to see where it failed (as the pictures make clear which
treated units have been matched and which have not). The random recursive partitioning
method can be used to accomplish several other tasks, such as outlier analysis, clustering
and classification. Large sample properties of our ATT estimators can be derived as in Abadie
and Imbens (2004).

split  $\hat\tau$  $\hat\sigma_\tau$  $\hat\tau'$  $\hat\sigma_{\tau'}$  T    C     % o.t.  SW   SW'  95% C.I. ($\tau$)   95% C.I. ($\tau'$)

Lalonde: 297 treated, 425 controls; naive targets: ATT=886, SD=488
50     822.9       492.5              958.2        644.1                 297  425   -                 (705.0 : 948.1)     (880.0 : 1058.2)
32     824.1       494.2              962.3        648.2                 297  425   -                 (682.4 : 970.9)     (859.0 : 1064.8)
20     820.4       495.7              956.3        653.6                 296  424   -                 (643.9 : 993.1)     (805.3 : 1103.0)
16     820.1       498.6              955.1        659.1                 296  424   -            •    (628.1 : 1033.7)    (807.7 : 1132.0)
8      803.4       501.7              940.4        673.5                 290  424   90.8              (534.7 : 1097.7)    (728.5 : 1218.3)
4      755.6       463.4              907.3        658.3                 267  423   1.2     •    •    (432.2 : 1150.6)    (598.7 : 1282.2)
2      875.6       253.2              1070.8       535.4                 235  416   0.0               (505.3 : 1299.7)    (714.3 : 1476.3)

Dehejia-Wahba: 185 treated, 260 controls; naive targets: ATT=1794, SD=670
50     1741.3      681.1              1718.0       680.4                 185  260   -            •••  (1604.0 : 1880.6)   (1601.7 : 1806.0)
32     1763.1      681.7              1741.2       862.7                 185  260   -       •         (1563.4 : 1917.5)   (1575.8 : 1887.1)
20     1767.6      685.9              1753.3       827.9                 185  260   99.6    •••  •    (1541.4 : 1979.2)   (1551.1 : 1938.8)
16     1784.0      685.4              1777.8       876.1                 184  260   99.6              (1525.5 : 2052.6)   (1546.5 : 1991.4)
8      1750.0      691.2              1757.6       895.4                 180  260   88.4         •    (1420.5 : 2124.1)   (1466.1 : 2130.9)
4      1691.9      641.9              1730.6       870.4                 163  259   1.2     •         (1298.2 : 2157.5)   (1336.5 : 2140.1)
2      1797.3      395.9              1919.8       719.7                 141  254   0.0               (1288.7 : 2326.1)   (1425.0 : 2473.4)

Smith-Todd: 108 treated, 142 controls; naive targets: ATT=2748, SD=1005
50     2651.1      1021.1             2889.0       1243.0                108  142   -       •••  •••  (2481.7 : 2873.0)   (2707.0 : 3019.0)
32     2657.2      1027.0             2905.2       1253.6                108  142           •••  •••  (2408.1 : 2876.0)   (2716.7 : 3064.2)
20     2646.6      1029.7             2905.7       1264.4                108  142   98.8         •••  (2340.4 : 2934.4)   (2673.0 : 3138.9)
16     2648.8      1036.1             2910.5       1277.9                108  142   98.0              (2295.7 : 2967.6)   (2610.3 : 3182.7)
8      2654.9      1051.8             2912.2       1308.3                105  142   79.2              (2096.4 : 3273.9)   (2410.3 : 3413.7)
4      2705.0      976.1              2945.5       1261.8                96   141   10.4              (1997.9 : 3524.4)   (2401.9 : 3625.2)
2      2680.2      319.8              3014.0       859.8                 82   137   0.0               (1858.0 : 3499.3)   (2181.6 : 3917.3)

Table 2: The results of random trees built on the treated. Average values over 250 replications. $\hat\tau$ and $\hat\sigma_\tau$ are the
estimators of $\tau$ and its standard deviation in the population ($\hat\tau'$ and $\hat\sigma_{\tau'}$ are the difference-in-covariate
corrected versions; see the text). T and C are respectively the average number of treated and controls matched per tree. The percentage
of trees ("-" = 100%) that match at least 95% of the treated is reported in column "% o.t.". SW and SW' report the results of the
Shapiro-Wilk test for normality for $\hat\tau$ and $\hat\tau'$ respectively: the bullets (•) mean that the hypothesis of normality is
rejected at the corresponding level (• = 0.05, •• = 0.01, ••• = 0.001). The last two columns are the 95% Monte Carlo confidence
intervals for $\tau$ and $\tau'$.
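
The summary columns of such a table can be obtained from the per-replication estimates. The sketch below assumes, as one common reading, that a "Monte Carlo confidence interval" is the interval between the empirical 2.5% and 97.5% quantiles of the estimates across replications; this interpretation and the function name `monte_carlo_summary` are assumptions, not statements about the authors' code. The Shapiro-Wilk test is not part of the Python standard library and is therefore left out (it is available, for instance, as `scipy.stats.shapiro`).

```python
import statistics


def monte_carlo_summary(estimates, level=0.95):
    """Mean and empirical-quantile interval of per-replication ATT estimates.

    Assumption: the interval is bounded by the empirical (1 - level) / 2 and
    (1 + level) / 2 quantiles, computed by linear interpolation between
    order statistics.
    """
    xs = sorted(estimates)
    n = len(xs)
    if n < 2:
        raise ValueError("at least two replications are needed")

    def quantile(q):
        pos = q * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    alpha = (1.0 - level) / 2.0
    return statistics.fmean(xs), (quantile(alpha), quantile(1.0 - alpha))
```

With k replications, `estimates` would hold the k values of the ATT estimator for a given split, and the returned pair corresponds to one estimate column and one interval column of the table.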





split  $\hat\tau$  $\hat\sigma_\tau$  $\hat\tau'$  $\hat\sigma_{\tau'}$  T    C     % o.t.  SW   SW'  95% C.I. ($\tau$)   95% C.I. ($\tau'$)

Lalonde: 297 treated, 425 controls; naive targets: ATT=886, SD=488
50     820.7       493.2              944.0        645.5                 297  424           •    •    (692.7 : 955.7)     (834.6 : 1062.0)
32     823.6       494.5              940.9        648.7                 297  424                     (667.0 : 956.4)     (811.7 : 1051.7)
20     824.8       498.2              935.8        655.9                 297  421           •         (667.6 : 992.6)     (790.1 : 1096.6)
16     836.7       499.1              949.2        659.7                 297  417                     (638.0 : 1022.1)    (763.4 : 1114.7)
8      852.6       501.9              958.2        673.8                 297  388                     (588.0 : 1145.3)    (702.9 : 1203.5)
4      849.8       475.6              950.1        679.1                 297  322                     (467.5 : 1230.9)    (570.7 : 1316.7)
2      790.5       306.8              849.0        610.5                 295  247                     (292.9 : 1299.4)    (375.8 : 1344.4)

Dehejia-Wahba: 185 treated, 260 controls; naive targets: ATT=1794, SD=670
50     1733.3      682.9              1701.4       860.3                 185  260           •••  •••  (1571.7 : 1867.7)   (1558.4 : 1827.8)
32     1759.1      684.2              1730.9       865.5                 185  259           •••  •••  (1580.7 : 1939.4)   (1568.3 : 1897.1)
20     1773.3      688.6              1753.0       873.4                 185  259           •         (1531.7 : 1968.6)   (1539.7 : 1938.6)
16     1785.8      688.7              1761.0       877.5                 185  256                     (1562.2 : 2015.3)   (1567.6 : 1954.7)
8      1808.4      692.4              1776.1       895.3                 185  239                     (1525.5 : 2123.5)   (1535.8 : 2059.3)
4      1826.7      651.3              1772.4       891.1                 185  200                     (1383.7 : 2267.6)   (1399.6 : 2187.9)
2      1770.2      474.7              1620.2       801.8                 183  164                •    (1221.0 : 2407.0)   (1171.7 : 2228.4)

Smith-Todd: 108 treated, 142 controls; naive targets: ATT=2748, SD=1005
50     2636.6      1020.5             2869.1       1243.1                108  142                •    (2467.1 : 2855.9)   (2740.4 : 2995.1)
32     2622.6      1028.7             2857.7       1255.8                108  142                     (2386.7 : 2914.1)   (2629.3 : 3062.1)
20     2859.4      1044.3             2834.0       1277.5                108  140                     (2279.3 : 2884.0)   (2570.7 : 3087.8)
16     2622.6      1043.0             2850.2       1281.5                108  138                •    (2265.7 : 2966.1)   (2570.3 : 3202.6)
8      2859.3      1055.2             2836.6       1315.6                108  125                     (2108.9 : 3155.9)   (2355.0 : 3310.1)
4      2620.6      1001.3             2756.9       1307.7                108  101                     (1801.1 : 2258.4)   (2148.7 : 3412.9)
2      2709.2      598.1              2691.9       1079.2                107  75                      (1809.2 : 3724.0)   (1862.1 : 3548.2)

Table 3: The results of random trees built on the controls. Average values over 250 replications. $\hat\tau$ and $\hat\sigma_\tau$ are the
estimators of $\tau$ and its standard deviation in the population ($\hat\tau'$ and $\hat\sigma_{\tau'}$ are the difference-in-covariate
corrected versions; see the text). T and C are respectively the average number of treated and controls matched per tree. The percentage
of trees ("-" = 100%) that match at least 95% of the treated is reported in column "% o.t.". SW and SW' report the results of the
Shapiro-Wilk test for normality for $\hat\tau$ and $\hat\tau'$ respectively: the bullets (•) mean that the hypothesis of normality is
rejected at the corresponding level (• = 0.05, •• = 0.01, ••• = 0.001). The last two columns are the 95% Monte Carlo confidence
intervals for $\tau$ and $\tau'$.



split  $\hat\tau$  $\hat\sigma_\tau$  $\hat\tau'$  $\hat\sigma_{\tau'}$  T    C     % o.t.  SW  SW'   95% C.I. ($\tau$)    95% C.I. ($\tau'$)

Trees built on the treated
50     -1931.1     1357.4             557.2        1848.5                175  2398  54.0    ••• •••   (-3645.7 : 312.1)    (-669.4 : 2432.8)
32     -1663.9     1198.2             806.4        1697.6                169  2359  26.0    ••• •••   (-3200.9 : 254.2)    (-296.3 : 2191.2)
20     -1749.8     1117.2             752.7        1610.6                161  2323  2.4               (-3465.2 : 390.9)    (-463.5 : 2314.9)
16     -1777.2     1058.0             789.2        1569.5                159  2306  1.2               (-3354.4 : -71.2)    (-414.4 : 1997.4)
8      -2127.1     938.1              755.5        1439.9                148  2221  0.0               (-3884.0 : -351.4)   (-302.5 : 1848.2)
4      -2604.8     761.3              813.5        1280.9                146  1915  0.0     •         (-4487.6 : -780.0)   (-270.0 : 2138.5)
2      -2736.9     491.3              612.2        1139.5                146  349   0.0               (-3940.6 : -1269.3)  (-417.0 : 1697.7)

Trees built on the controls
50     -692.4      1077.0             1049.7       1657.8                176  552   58.4    •••       (-2271.0 : 1255.0)   (5.1 : 2318.1)
32     -619.3      1072.2             984.4        1636.7                177  449   58.4    ••• •••   (-2364.2 : 1222.4)   (-208.2 : 2401.3)
20     -392.9      1034.6             1058.3       1588.2                177  359   54.4    ••• •••   (-1947.2 : 1425.2)   (-39.9 : 2472.2)
16     -295.2      1033.0             1065.1       1586.1                177  321   55.2    ••• •••   (-1583.8 : 1398.2)   (34.9 : 2466.6)
8      -64.7       977.7              1263.8       1572.6                177  243   51.2    •••       (-1481.8 : 1373.7)   (221.5 : 2499.6)
4      -877.6      997.7              1211.7       1643.1                176  178   52.4    •         (-2610.1 : 741.2)    (80.9 : 2572.7)
2      -323.4      534.5              1287.6       1452.1                166  128   6.4               (-1663.9 : 1101.4)   (180.0 : 2494.1)

Table 4: Random tree results from the tree built on the treated (up) and on the controls (down) for the DW versus PSID sample. Average
values over 250 replications. $\hat\tau$ and $\hat\sigma_\tau$ are the estimators of $\tau$ and its standard deviation in the population
($\hat\tau'$ and $\hat\sigma_{\tau'}$ are the difference-in-covariate corrected versions; see the text). T and C are respectively the
average number of treated and controls matched per tree. The percentage of trees that match at least 95% of the treated is reported in
column "% o.t.". SW and SW' report the results of the Shapiro-Wilk test for normality for $\hat\tau$ and $\hat\tau'$ respectively: the
bullets (•) mean that the hypothesis of normality is rejected at the corresponding level (• = 0.05, •• = 0.01, ••• = 0.001). The last
two columns are the 95% Monte Carlo confidence intervals for $\tau$ and $\tau'$.





split  $\tilde\tau$  $\tilde\sigma_\tau$  $\tilde\tau'$  $\tilde\sigma_{\tau'}$  T    C     % o.t.

Lalonde: 297 treated, 425 controls; naive targets: ATT=886, SD=488
50     866.0         402.8                974.7          423.5                   297  425
32     849.5         403.2                965.5          423.9                   297  425
20     848.8         403.8                966.1          424.6                   296  424   -
16     851.1         404.2                966.1          425.1                   296  424   -
8      841.9         406.4                962.5          427.4                   290  424   90.8
4      829.4         409.7                956.3          431.1                   267  423   1.2
2      851.1         420.5                1006.0         422.9                   235  416   0.0

Dehejia-Wahba: 185 treated, 260 controls; naive targets: ATT=1794, SD=670
50     1766.2        579.7                1794.0         599.5                   185  260   -
32     1759.6        580.1                1708.7         600.0                   185  260   -
20     1737.7        581.0                1708.6         600.9                   185  260   99.6
16     1742.0        581.5                1727.2         601.5                   184  260   99.6
8      1725.8        584.5                1730.6         604.7                   180  260   88.4
4      1685.3        589.6                1685.4         610.3                   163  259   1.2
2      1720.8        603.4                1756.5         625.2                   141  254   0.0

Smith-Todd: 108 treated, 142 controls; naive targets: ATT=2748, SD=1005
50     2718.2        870.7                2873.3         914.4                   108  142
32     2692.8        871.4                2876.4         915.2                   108  142
20     2655.3        872.3                2875.4         916.3                   108  142   98.8
16     2662.7        872.9                2879.0         917.0                   108  142   98.0
8      2627.0        875.7                2884.6         920.3                   105  142   79.2
4      2620.7        879.2                2882.4         924.4                   96   141   10.4
2      2560.8        898.1                2881.1         946.6                   82   137   0.0

Table 5: The results of random trees built on the treated over 250 replications. $\tilde\tau$ and $\tilde\sigma_\tau$ are
the estimators of $\tau$ and its standard deviation in the population ($\tilde\tau'$ and $\tilde\sigma_{\tau'}$ are the
difference-in-covariate corrected versions; see the text). T and C are respectively the average number of treated
and controls matched per tree. The percentage of trees ("-" = 100%) that match at least 95%
of the treated is reported in column "% o.t.".








split  $\tilde\tau$  $\tilde\sigma_\tau$  $\tilde\tau'$  $\tilde\sigma_{\tau'}$  T    C     % o.t.

Lalonde: 297 treated, 425 controls; naive targets: ATT=886, SD=488
50     842.5         403.1                947.5          423.8                   297  424
32     847.3         403.7                949.4          424.5                   297  424
20     836.3         404.6                943.3          425.5                   297  421
16     838.2         405.2                949.5          426.2                   297  417
8      856.8         408.4                962.4          429.6                   297  388
4      862.8         413.4                961.5          435.2                   297  322
2      892.6         428.9                903.8          452.1                   295  247

Dehejia-Wahba: 185 treated, 260 controls; naive targets: ATT=1794, SD=670
50     1753.6        580.1                1702.3         600.0                   185  260
32     1758.9        580.8                1708.0         578.4                   185  259
20     1764.8        582.0                1737.0         602.0                   185  259
16     1790.6        582.8                1758.6         602.9                   185  256
8      1801.0        587.0                1760.0         607.5                   185  239
4      1818.2        595.0                1781.9         616.1                   185  200
2      1845.6        613.7                1644.9         636.4                   183  164

Smith-Todd: 108 treated, 142 controls; naive targets: ATT=2748, SD=1005
50     2671.5        871.1                2862.3         914.9                   108  142
32     2627.3        871.8                2830.1         915.8                   108  142
20     2603.4        872.1                2825.4         917.3                   108  140
16     2628.7        874.0                2851.2         918.3                   108  138
8      2610.5        877.5                2846.7         922.5                   108  125
4      2589.7        883.4                2750.4         929.4                   108  101
2      2883.2        913.2                2787.4         912.0                   107  75

Table 6: The results of random trees built on the controls over 250 replications. $\tilde\tau$ and $\tilde\sigma_\tau$ are
the estimators of $\tau$ and its standard deviation in the population ($\tilde\tau'$ and $\tilde\sigma_{\tau'}$ are the
difference-in-covariate corrected versions; see the text). T and C are respectively the average number of treated
and controls matched per tree. The percentage of trees ("-" = 100%) that match at least 95%
of the treated is reported in column "% o.t.".

split  $\tilde\tau$  $\tilde\sigma_\tau$  $\tilde\tau'$  $\tilde\sigma_{\tau'}$  T    C     % o.t.

Trees built on the treated
50     -3592.2       597.6                747.6          665.5                   175  2398  54.0
32     -3381.2       602.1                980.5          671.8                   169  2359  26.0
20     -3599.8       606.9                937.7          678.4                   161  2323  2.4
16     -3756.8       612.6                1100.8         686.6                   159  2306  1.2
8      -4441.8       627.3                1126.8         706.5                   148  2221  0.0
4      -6516.3       620.7                1066.1         697.5                   146  1915  0.0
2      -3380.3       658.7                906.9          11739.9                 146  349   0.0

Trees built on the controls
50     -649.3        594.7                1162.8         659.8                   176  552   58.4
32     -428.2        593.3                1131.3         657.3                   177  449   58.4
20     -137.7        596.0                1162.7         661.3                   177  359   54.4
16     -5.2          598.5                1210.8         664.8                   177  321   55.2
8      -42.7         618.9                1298.2         693.5                   177  243   51.2
4      -1457.0       643.0                1263.0         728.4                   176  178   52.4
2      -1167.5       731.8                1326.4         842.8                   166  128   6.4

Table 7: Random tree results over 250 replications from the tree built on the treated (up) and
on the controls (down) for the DW versus PSID sample. $\tilde\tau$ and $\tilde\sigma_\tau$ are the estimators of $\tau$ and
its standard deviation in the population ($\tilde\tau'$ and $\tilde\sigma_{\tau'}$ are the difference-in-covariate corrected
versions; see the text). T and C are respectively the average number of treated and controls matched
per tree. The percentage of trees that match at least 95% of the treated is reported in column
"% o.t.".

Figure 1: Proximity matrix for random partitions built on the controls. LL experimental dataset
for different split values. Large values of the split parameter produce spurious matches that tend
to vanish as the split value decreases. In the last matrix, only "true" matches survive.

Figure 2: Proximity matrix for random partitions built on the controls. DW experimental
dataset (up) and DWvsPSID non-experimental dataset (middle). The bottom image is the
same as the top one with a 2490-point x axis (as in the middle image). The middle image
contains few faint spots, contrary to the top image, which shows the good quality of the match
that can be achieved for the DW experimental data. For further details see Section 7.